-
-
Notifications
You must be signed in to change notification settings - Fork 2
Add quotation analysis to NMT builds #718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
86f3543 to
57f706d
Compare
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 41 of 41 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Enkidu93)
src/Serval/src/Serval.Shared/Models/CorpusAnalysis.cs line 5 at r1 (raw file):
public record CorpusAnalysis { public required string CorpusRef { get; init; }
This should be ParallelCorpusRef so that the naming is consistent with the PretranslateCorpus model. Do we need to continue to support the obsolete Corpus data model with this new feature? I would prefer not to.
src/Serval/src/Serval.Grpc/Protos/serval/translation/v1/platform.proto line 77 at r1 (raw file):
} message UpdateCorpusAnalysisRequest {
I would rename this and the associated result class to include Parallel so that it is clear that this is referring to a parallel corpus and not a monolingual corpus.
src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs line 8 at r1 (raw file):
private const int Seed = 1234; public async Task AnalyseCorporaAsync(
Could this method just return the conventions rather than accept a delegate? We could just run this on a single corpus to simplify the return value.
src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs line 124 at r1 (raw file):
{ List<CorpusAnalysis> corpusAnalysis = []; await ParallelCorpusPreprocessingService.AnalyseCorporaAsync(
This is only needed for NMT jobs. I think this should be implemented in NmtPreprocessBuildJob.
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ddaspit and @Enkidu93)
src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs line 124 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This is only needed for NMT jobs. I think this should be implemented in
NmtPreprocessBuildJob.
Done. To do this, I needed to change the base method to be virtual instead of `abstract.
src/Serval/src/Serval.Grpc/Protos/serval/translation/v1/platform.proto line 77 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I would rename this and the associated result class to include
Parallelso that it is clear that this is referring to a parallel corpus and not a monolingual corpus.
Done.
src/Serval/src/Serval.Shared/Models/CorpusAnalysis.cs line 5 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This should be
ParallelCorpusRefso that the naming is consistent with thePretranslateCorpusmodel. Do we need to continue to support the obsoleteCorpusdata model with this new feature? I would prefer not to.
Done,
src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs line 8 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Could this method just return the conventions rather than accept a delegate? We could just run this on a single corpus to simplify the return value.
Done.
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 26 of 26 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
src/Serval/src/Serval.Translation/Services/TranslationPlatformServiceV1.cs line 305 at r2 (raw file):
u => { if (request.ParallelCorpusAnalysis.Count > 0)
I'm not sure exactly what happens if no update ops are called. It might result in an unnecessary Mongo call. Just to be safe, it would make sense to wrap the entire UpdateAsync call in this check.
src/Serval/src/Serval.Translation/Services/TranslationPlatformServiceV1.cs line 312 at r2 (raw file):
.ParallelCorpusAnalysis.Select(a => new ParallelCorpusAnalysis { ParallelCorpusRef = a.ParallelCorpusId,
Is SF still using the obsolete Corpus model? If the ParallelCorpusId is actually an obsolete Corpus id, then setting ParallelCorpusRef to ParallelCorpusId would be incorrect. To avoid this, we could check that any of the ParallelCorpusId values exists in Pretranslate.ParallelCorpusRef in the UpdateAsync filter. This would be a way of ensuring that the build is using the newer ParallelCorpus model. If it is using the obsolete model, then it simply wouldn't set the analysis.
d33e032 to
b7340fe
Compare
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 39 of 43 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @Enkidu93)
src/Serval/src/Serval.Translation/Services/TranslationPlatformServiceV1.cs line 305 at r2 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I'm not sure exactly what happens if no update ops are called. It might result in an unnecessary Mongo call. Just to be safe, it would make sense to wrap the entire
UpdateAsynccall in this check.
Done. Thank you!
src/Serval/src/Serval.Translation/Services/TranslationPlatformServiceV1.cs line 312 at r2 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Is SF still using the obsolete
Corpusmodel? If theParallelCorpusIdis actually an obsoleteCorpusid, then settingParallelCorpusReftoParallelCorpusIdwould be incorrect. To avoid this, we could check that any of theParallelCorpusIdvalues exists inPretranslate.ParallelCorpusRefin theUpdateAsyncfilter. This would be a way of ensuring that the build is using the newerParallelCorpusmodel. If it is using the obsolete model, then it simply wouldn't set the analysis.
Done. SF not using the obsolete Corpus model. I have updated the logic to check against the engine's list of parallel corpora.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 16 of 41 files at r1, 23 of 26 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @pmachapman)
b7340fe to
996d95b
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #718 +/- ##
==========================================
- Coverage 66.27% 66.27% -0.01%
==========================================
Files 364 368 +4
Lines 19450 19588 +138
Branches 2494 2505 +11
==========================================
+ Hits 12891 12981 +90
- Misses 5648 5695 +47
- Partials 911 912 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Resolves #665.
Requires sillsdev/machine/pull/316.
This change is